Succinct Data Structures for NLP-at-Scale
Abstract
Succinct data structures combine novel data structure designs, compression techniques, and related mechanisms to store data in extremely small memory or disk footprints while still allowing efficient access to the underlying data. They have been applied successfully in areas such as Information Retrieval and Bioinformatics to create highly compressed in-memory search indexes that provide efficient search over datasets which traditionally could only be processed using external-memory data structures. Modern techniques in this space are not well known within the NLP community, but they have the potential to revolutionise NLP, particularly its application to ‘big data’ in the form of terabyte-scale and larger corpora. This tutorial will present a practical introduction to the most important succinct data structures, tools, and applications, with the intent of giving researchers a jump-start into this domain. The focus will be efficient text processing using space-efficient representations of suffix arrays and suffix trees and searchable integer compression schemes, with specific applications of succinct data structures to common NLP tasks such as n-gram language modelling.
Similar resources
The effectiveness of group training in neuro-linguistic programming on hope and quality of life in children with cancer
Objectives This study aimed to examine the effect of Neuro-Linguistic Programming (NLP) on hope and quality of life in children with cancer. Methods The study used a quasi-experimental design with pretest, posttest, follow-up, and a control group. The study population consisted of children (male and female) with cancer at AminrKabir Hospital and Tabassom Cancer Support Community in 2016 who ap...
Succinct data structures for assembling large genomes
MOTIVATION Second-generation sequencing technology makes it feasible for many researchers to obtain enough sequence reads to attempt the de novo assembly of higher eukaryotes (including mammals). De novo assembly not only provides a tool for understanding wide scale biological variation, but within human biomedicine, it offers a direct way of observing both large-scale structural variation and f...
Space-Efficient, High-Performance Rank and Select Structures on Uncompressed Bit Sequences
Rank & select data structures are one of the fundamental building blocks for many modern succinct data structures. With the continued growth of massive-scale information services, the space efficiency of succinct data structures is becoming increasingly attractive in practice. In this paper, we re-examine the design of rank & select data structures from the bottom up, applying an architectural ...
Publication date: 2016